Flexible Parameter Tying for Conversational Speech Recognition
Abstract
Modeling pronunciation variation is key to recognizing conversational speech. Previous efforts at pronunciation modeling that modified dictionaries yielded only marginal improvement. Because of the complex interaction between dictionaries and acoustic models, we believe a pronunciation modeling scheme is plausible only when closely coupled with the underlying acoustic model. This paper explores the use of flexible parameter tying for pronunciation modeling. In particular, two new techniques are investigated: Gaussian tying and flexible tree clustering. We report a 1.3% absolute improvement in word error rate (WER) over the traditional modeling framework on the Switchboard task.
Similar resources
Parameter tying for flexible speech recognition
This paper presents two parameter tying techniques which enable a trade-off between computational cost and recognition performance of a speaker-independent flexible speech recognition system working over the telephone network. Parameter tying is conducted at phonetic and acoustic levels. At the phonetic level, allophone and triphone based phonetic modeling are used simultaneously to achieve th...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition
Modeling pronunciation variation is key for recognizing conversational speech. Rather than being limited to dictionary modeling, we argue that triphone clustering is an integral part of pronunciation modeling. We propose a new approach called enhanced tree clustering. This approach, in contrast to traditional decision tree based state tying, allows parameter sharing across phonemes. We show tha...
Recognizing Sloppy Speech
As speech recognition moves from labs into the real world, the sloppy speech problem emerges as a major challenge. Sloppy speech, or conversational speech, refers to the speaking style people typically use in daily conversations. The recognition error rate for sloppy speech has been found to double that of read speech in many circumstances. Previous work on sloppy speech has focused on modeling...
Tree-structured models of parameter dependence for rapid adaptation in large vocabulary conversational speech recognition
Two models of statistical dependence between acoustic model parameters of a large vocabulary conversational speech recognition (LVCSR) system are investigated for the purpose of rapid speaker- and environment-adaptation from a very small amount of speech: (i) a Gaussian multiscale process governed by a stochastic linear dynamical system on a tree, and (ii) a simple hierarchical tree-structured pri...